sample efficiency
Simple random search of static linear policies is competitive for reinforcement learning
Model-free reinforcement learning aims to offer off-the-shelf solutions for controlling dynamical systems without requiring models of the system dynamics. We introduce a model-free random search algorithm for training static, linear policies for continuous control problems. Common evaluation methodology shows that our method matches state-of-the-art sample efficiency on the benchmark MuJoCo locomotion tasks. Nonetheless, more rigorous evaluation reveals that the assessment of performance on these benchmarks is optimistic. We evaluate the performance of our method over hundreds of random seeds and many different hyperparameter configurations for each benchmark task. This extensive evaluation is possible because of the small computational footprint of our method. Our simulations highlight a high variability in performance in these benchmark tasks, indicating that commonly used estimations of sample efficiency do not adequately evaluate the performance of RL algorithms. Our results stress the need for new baselines, benchmarks and evaluation methodology for RL algorithms.
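The random search described in the abstract can be sketched as follows. This is a minimal, hypothetical sketch in the style of the paper's augmented random search. It uses antithetic Gaussian perturbations of a static linear policy matrix, keeps the top-b directions, and scales the step by the standard deviation of the rewards. It omits the state normalization of the paper's strongest variant. The toy linear system, the function names (`rollout_reward`, `ars_v1t`) and all hyperparameters are assumptions standing in for a MuJoCo task.

```python
import numpy as np


def rollout_reward(M, A, B, x0, horizon=20):
    """Hypothetical toy task: negative quadratic cost of the static linear
    policy u = M x on the linear system x_{t+1} = A x + B u."""
    x = x0.copy()
    total = 0.0
    for _ in range(horizon):
        u = M @ x
        total -= float(x @ x + u @ u)
        x = A @ x + B @ u
    return total


def ars_v1t(reward_fn, shape, n_iters=200, n_dirs=8, top_b=4,
            step_size=0.02, noise=0.03, seed=0):
    """Random search over a static linear policy matrix (assumed V1-t style:
    antithetic perturbations, top-b directions, reward-std step scaling)."""
    rng = np.random.default_rng(seed)
    M = np.zeros(shape)
    for _ in range(n_iters):
        # Sample n_dirs Gaussian perturbation directions of the policy matrix.
        deltas = rng.standard_normal((n_dirs, *shape))
        # Evaluate each direction with both signs (antithetic sampling).
        r_plus = np.array([reward_fn(M + noise * d) for d in deltas])
        r_minus = np.array([reward_fn(M - noise * d) for d in deltas])
        # Keep only the top_b directions ranked by max(r_plus, r_minus).
        top = np.argsort(np.maximum(r_plus, r_minus))[::-1][:top_b]
        # Scale the step by the std of the rewards used in the update.
        sigma_r = np.concatenate([r_plus[top], r_minus[top]]).std()
        sigma_r = sigma_r if sigma_r > 0 else 1.0
        # Finite-difference ascent direction: sum_k (r+_k - r-_k) * delta_k.
        direction = np.tensordot(r_plus[top] - r_minus[top], deltas[top], axes=1)
        M = M + step_size / (top_b * sigma_r) * direction
    return M


# Usage on a slightly unstable 2-D system (values are illustrative only).
A = np.array([[1.01, 0.01], [0.0, 1.01]])
B = np.eye(2)
x0 = np.array([1.0, 1.0])


def reward(M):
    return rollout_reward(M, A, B, x0)


M_learned = ars_v1t(reward, shape=(2, 2))
print(reward(np.zeros((2, 2))), reward(M_learned))
```

Because each run is cheap, studying variability across seeds, as the abstract describes, amounts to rerunning `ars_v1t` with many `seed` values and examining the spread of final rewards rather than a single average.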
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee
We propose stochastic ensemble value expansion (STEVE), a novel model-based technique that addresses a key difficulty in combining model-based and model-free learning: errors in an imperfect dynamics model can degrade performance. By dynamically interpolating between model rollouts of various horizon lengths for each individual example, STEVE ensures that the model is only utilized when doing so does not introduce significant errors.
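STEVE's per-example interpolation weights candidate targets of different rollout horizons by their inverse variance across an ensemble, so horizons where the learned models disagree contribute little. The sketch shows only that combination step and takes the candidate samples as given; in STEVE they come from ensembles of Q-functions, dynamics models and reward models. The function name `combine_horizon_targets` and the example numbers are hypothetical.

```python
import numpy as np


def combine_horizon_targets(candidate_targets, eps=1e-8):
    """Inverse-variance weighting of H-step value targets (H = 0, 1, ...).

    candidate_targets[H] is a 1-D array of ensemble samples of the H-step
    target for one example. Returns the combined target and the weights.
    """
    means = np.array([np.mean(t) for t in candidate_targets])
    variances = np.array([np.var(t) for t in candidate_targets])
    # Low ensemble variance -> high weight; eps avoids division by zero.
    weights = 1.0 / (variances + eps)
    weights = weights / weights.sum()
    return float(np.dot(weights, means)), weights


# Hypothetical samples: the short-horizon target is consistent across the
# ensemble, the long-horizon rollout target is not, so it gets little weight.
samples = [np.array([1.0, 1.1, 0.9]), np.array([3.0, -1.0, 5.0])]
target, w = combine_horizon_targets(samples)
print(target, w)
```

Here the combined target stays close to the low-variance estimate of 1.0, which is the behaviour the abstract describes: the longer model rollout is used only to the extent that the ensemble agrees on it.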
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.83)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.65)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Maryland (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Leisure & Entertainment (0.46)
- Energy > Power Industry (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
- Information Technology > Artificial Intelligence > Robots (0.93)
We provide a simple pseudo-2
We thank all the reviewers for their constructive comments. We will provide details in the final draft. MCUNet shows consistent improvement across different devices (F746, H743) and tasks (classification, detection).
R1: Whether the overall network topology brings major improvement.
R2: Why the auto-tuning in TVM fails to work on MCUs.
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Germany > Berlin (0.04)
- Europe > Poland > Masovia Province > Warsaw (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
- Asia > China > Hong Kong (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)